Distributed linear regression by averaging
Distributed statistical learning problems arise commonly when dealing with
large datasets. In this setup, datasets are partitioned over machines, which
compute locally, and communicate short messages. Communication is often the
bottleneck. In this paper, we study one-step and iterative weighted parameter
averaging in statistical linear models under data parallelism. We do linear
regression on each machine, send the results to a central server, and take a
weighted average of the parameters. Optionally, we iterate, sending back the
weighted average and doing local ridge regressions centered at it. How does
this compare to doing linear regression on the full data? Here we study
the performance loss in estimation, test error, and confidence interval length
in high dimensions, where the number of parameters is comparable to the
training data size. We quantify the performance loss of one-step weighted
averaging, and also give results for iterative averaging. We also find that
different problems are affected differently by the distributed framework.
Estimation error and confidence interval length increase substantially, while
prediction error increases much less. We rely on recent results from random
matrix theory, where we develop a new calculus of deterministic equivalents as
a tool of broader interest.Comment: V2 adds a new section on iterative averaging methods, adds
applications of the calculus of deterministic equivalents, and reorganizes
the pape
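The one-step and iterative schemes described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the uniform weights, sample sizes, and ridge penalty below are assumptions chosen for the example (the paper studies optimally weighted variants), and the local ridge step uses the closed form b = (X'X + lam*I)^{-1}(X'y + lam*c) for a ridge regression centered at c.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup for illustration: n samples, p features, k machines.
n, p, k = 600, 20, 3
beta = rng.normal(size=p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

# Partition the data across machines and fit OLS locally on each shard.
shards = np.array_split(np.arange(n), k)
local_estimates = []
for idx in shards:
    Xi, yi = X[idx], y[idx]
    bi, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
    local_estimates.append(bi)

# One-step averaging: uniform weights, since the shards are equal-sized.
beta_dist = np.mean(local_estimates, axis=0)

# One round of iteration: each machine solves a local ridge regression
# centered at the current average, then the center re-averages.
lam = 1.0  # assumed penalty level, for illustration only
center = beta_dist
refined = []
for idx in shards:
    Xi, yi = X[idx], y[idx]
    G = Xi.T @ Xi + lam * np.eye(p)
    refined.append(np.linalg.solve(G, Xi.T @ yi + lam * center))
beta_iter = np.mean(refined, axis=0)

# Full-data OLS, the benchmark the paper compares against.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Each machine communicates only a length-p vector per round, which is the sense in which averaging keeps communication short.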
Regularity Properties for Sparse Regression
Statistical and machine learning theory has developed several conditions
ensuring that popular estimators such as the Lasso or the Dantzig selector
perform well in high-dimensional sparse regression, including the restricted
eigenvalue, compatibility, and sensitivity properties. However, some
of the central aspects of these conditions are not well understood. For
instance, it is unknown if these conditions can be checked efficiently on any
given data set. This is problematic, because they are at the core of the theory
of sparse regression.
Here we provide a rigorous proof that these conditions are NP-hard to check.
This shows that the conditions are computationally infeasible to verify in the
worst case, and raises questions about their practical applicability.
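To see concretely why such conditions are expensive to check, here is a naive exhaustive sketch of a related restricted-eigenvalue-type quantity: the smallest eigenvalue of the normalized Gram matrix over all supports of size s. This is an illustrative assumption-laden toy, not the paper's construction or the exact conditions it studies; its point is that direct verification enumerates all C(p, s) supports, a combinatorial cost consistent with the hardness result.

```python
import numpy as np
from itertools import combinations

def min_sparse_eigenvalue(X, s):
    """Smallest eigenvalue of (1/n) X_S' X_S over all supports S of size s.

    Exhaustive search over supports: the loop runs C(p, s) times,
    illustrating why restricted-eigenvalue-type quantities are costly
    to verify directly as p grows.
    """
    n, p = X.shape
    G = X.T @ X / n
    best = np.inf
    for S in combinations(range(p), s):
        idx = np.array(S)
        # eigvalsh returns eigenvalues in ascending order.
        eigs = np.linalg.eigvalsh(G[np.ix_(idx, idx)])
        best = min(best, eigs[0])
    return best

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))  # well-behaved Gaussian design, n > p
val = min_sparse_eigenvalue(X, 2)
```

For a well-behaved design like this one the quantity is bounded away from zero, matching the average-case message of the abstract; the worst-case difficulty lies in certifying this for an arbitrary given matrix.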
However, by taking an average-case perspective instead of the worst-case view
of NP-hardness, we show that a particular condition, sensitivity, has
certain desirable properties. This condition is weaker and more general than
the others. We show that it holds with high probability in models where the
parent population is well behaved, and that it is robust to certain data
processing steps. These results are desirable, as they provide guidance about
when the condition, and more generally the theory of sparse regression, may be
relevant in the analysis of high-dimensional correlated observational data.

Comment: Manuscript shortened and more motivation added. To appear in
Communications in Mathematics and Statistics.